Overview

Dataset statistics

Number of variables: 22
Number of observations: 7109
Missing cells: 54
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.0 MiB
Average record size in memory: 731.5 B
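
The percentages and sizes above follow from the raw counts. As a rough consistency check (a sketch that only re-derives the figures printed in this block; the variable names are illustrative), the missing-cell share and the total memory size can be recomputed like this:

```python
# Figures copied from the "Dataset statistics" block above.
n_variables = 22
n_observations = 7109
missing_cells = 54
avg_record_bytes = 731.5

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Total size is roughly records x average record size, converted to MiB.
total_mib = avg_record_bytes * n_observations / 2**20

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.3f}%")
print(f"approx. size: {total_mib:.1f} MiB")
```

The missing share comes out well below 0.1%, which is why the report shows "< 0.1%" rather than a rounded number.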

Variable types

Categorical: 13
Numeric: 9

Alerts

PRT_ID has a high cardinality: 7109 distinct values High cardinality
DATE_SALE has a high cardinality: 2798 distinct values High cardinality
DATE_BUILD has a high cardinality: 5808 distinct values High cardinality
INT_SQFT is highly correlated with N_BEDROOM and 4 other fields High correlation
N_BEDROOM is highly correlated with INT_SQFT and 2 other fields High correlation
N_BATHROOM is highly correlated with N_BEDROOM and 1 other field High correlation
N_ROOM is highly correlated with INT_SQFT and 5 other fields High correlation
QS_ROOMS is highly correlated with QS_OVERALL High correlation
QS_BATHROOM is highly correlated with QS_OVERALL High correlation
QS_BEDROOM is highly correlated with QS_OVERALL High correlation
QS_OVERALL is highly correlated with QS_ROOMS and 2 other fields High correlation
REG_FEE is highly correlated with INT_SQFT and 3 other fields High correlation
COMMIS is highly correlated with INT_SQFT and 3 other fields High correlation
SALES_PRICE is highly correlated with INT_SQFT and 3 other fields High correlation
INT_SQFT is highly correlated with N_BEDROOM and 5 other fields High correlation
N_BEDROOM is highly correlated with INT_SQFT and 2 other fields High correlation
N_BATHROOM is highly correlated with INT_SQFT and 2 other fields High correlation
N_ROOM is highly correlated with INT_SQFT and 5 other fields High correlation
QS_ROOMS is highly correlated with QS_OVERALL High correlation
QS_BATHROOM is highly correlated with QS_OVERALL High correlation
QS_BEDROOM is highly correlated with QS_OVERALL High correlation
QS_OVERALL is highly correlated with QS_ROOMS and 2 other fields High correlation
REG_FEE is highly correlated with INT_SQFT and 3 other fields High correlation
COMMIS is highly correlated with INT_SQFT and 3 other fields High correlation
SALES_PRICE is highly correlated with INT_SQFT and 3 other fields High correlation
INT_SQFT is highly correlated with N_BEDROOM and 1 other field High correlation
N_BEDROOM is highly correlated with INT_SQFT and 2 other fields High correlation
N_BATHROOM is highly correlated with N_BEDROOM and 1 other field High correlation
N_ROOM is highly correlated with INT_SQFT and 3 other fields High correlation
REG_FEE is highly correlated with N_ROOM and 1 other field High correlation
SALES_PRICE is highly correlated with REG_FEE High correlation
N_ROOM is highly correlated with N_BATHROOM and 1 other field High correlation
AREA is highly correlated with N_BATHROOM High correlation
N_BATHROOM is highly correlated with N_ROOM and 2 other fields High correlation
N_BEDROOM is highly correlated with N_ROOM and 1 other field High correlation
AREA is highly correlated with INT_SQFT and 7 other fields High correlation
INT_SQFT is highly correlated with AREA and 6 other fields High correlation
N_BEDROOM is highly correlated with AREA and 3 other fields High correlation
N_BATHROOM is highly correlated with AREA and 3 other fields High correlation
N_ROOM is highly correlated with AREA and 6 other fields High correlation
BUILDTYPE is highly correlated with REG_FEE and 1 other field High correlation
MZZONE is highly correlated with AREA High correlation
QS_ROOMS is highly correlated with QS_OVERALL High correlation
QS_BATHROOM is highly correlated with QS_OVERALL High correlation
QS_BEDROOM is highly correlated with QS_OVERALL High correlation
QS_OVERALL is highly correlated with QS_ROOMS and 2 other fields High correlation
REG_FEE is highly correlated with AREA and 5 other fields High correlation
COMMIS is highly correlated with AREA and 4 other fields High correlation
SALES_PRICE is highly correlated with AREA and 5 other fields High correlation
PRT_ID is uniformly distributed Uniform
DATE_BUILD is uniformly distributed Uniform
PRT_ID has unique values Unique
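
The cardinality and uniqueness alerts above come down to counting distinct values. The sketch below is not pandas-profiling's own implementation: the function name `cardinality_alerts` and the cut-off of 50 distinct values are assumptions chosen for illustration, and the IDs are synthetic stand-ins for PRT_ID.

```python
def cardinality_alerts(values, threshold=50):
    """Return alert labels for one column; `threshold` is an assumed cut-off."""
    non_missing = [v for v in values if v is not None]
    distinct = len(set(non_missing))
    alerts = []
    if distinct > threshold:
        alerts.append("High cardinality")
    if non_missing and distinct == len(non_missing):
        alerts.append("Unique")
    return alerts


# Synthetic stand-in for PRT_ID: 7109 distinct identifiers.
prt_ids = [f"P{i:05d}" for i in range(7109)]
id_alerts = cardinality_alerts(prt_ids)
flag_alerts = cardinality_alerts(["Yes", "No", "Noo", "Yes"])
print(id_alerts)
print(flag_alerts)
```

An identifier column such as PRT_ID triggers both labels, while a low-cardinality flag column triggers neither.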

Reproduction

Analysis started: 2022-01-04 15:55:55.416513
Analysis finished: 2022-01-04 15:56:17.183970
Duration: 21.77 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json
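
The duration is simply the difference between the two timestamps. A minimal check using only the standard library (the timestamp format string is inferred from how the values are printed above):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # format inferred from the printed timestamps
started = datetime.strptime("2022-01-04 15:55:55.416513", FMT)
finished = datetime.strptime("2022-01-04 15:56:17.183970", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```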

Variables

PRT_ID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 7109
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 437.5 KiB

Top values (count): P07489 (1), P08284 (1), P02294 (1), P04833 (1), P04815 (1), Other values (7104) (7104)

Length

Max length: 6
Median length: 6
Mean length: 6
Min length: 6

Characters and Unicode

Total characters: 42654
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
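
To make the category counts concrete, the following sketch tallies Unicode general categories with Python's `unicodedata` module. The helper name `category_counts` is hypothetical, and the input is just the five PRT_ID sample rows shown in this report, not the full column. `Nd` corresponds to "Decimal Number" and `Lu` to "Uppercase Letter".

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)


# The five PRT_ID sample rows listed in this report.
sample = ["P03210", "P09411", "P01812", "P05346", "P06210"]
counts = category_counts(sample)
print(counts)
```

Each ID contributes one uppercase letter and five digits, which matches the roughly 1:5 split between the two categories reported for the whole column.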

Unique

Unique: 7109
Unique (%): 100.0%

Sample

1st row: P03210
2nd row: P09411
3rd row: P01812
4th row: P05346
5th row: P06210

Common Values

Value                Count  Frequency (%)
P07489               1      < 0.1%
P08284               1      < 0.1%
P02294               1      < 0.1%
P04833               1      < 0.1%
P04815               1      < 0.1%
P04597               1      < 0.1%
P03854               1      < 0.1%
P08207               1      < 0.1%
P06119               1      < 0.1%
P08474               1      < 0.1%
Other values (7099)  7099   99.9%

Length

Histogram of lengths of the category
Value                Count  Frequency (%)
p04142               1      < 0.1%
p05106               1      < 0.1%
p06410               1      < 0.1%
p04567               1      < 0.1%
p09414               1      < 0.1%
p09222               1      < 0.1%
p04118               1      < 0.1%
p02834               1      < 0.1%
p05570               1      < 0.1%
p03774               1      < 0.1%
Other values (7099)  7099   99.9%

Most occurring characters

Value  Count  Frequency (%)
0      9965   23.4%
P      7109   16.7%
1      2923   6.9%
6      2866   6.7%
7      2859   6.7%
5      2856   6.7%
3      2824   6.6%
9      2823   6.6%
8      2820   6.6%
2      2813   6.6%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    35545  83.3%
Uppercase Letter  7109   16.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
09965
28.0%
12923
 
8.2%
62866
 
8.1%
72859
 
8.0%
52856
 
8.0%
32824
 
7.9%
92823
 
7.9%
82820
 
7.9%
22813
 
7.9%
42796
 
7.9%
Uppercase Letter
ValueCountFrequency (%)
P7109
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common35545
83.3%
Latin7109
 
16.7%

Most frequent character per script

Common
ValueCountFrequency (%)
09965
28.0%
12923
 
8.2%
62866
 
8.1%
72859
 
8.0%
52856
 
8.0%
32824
 
7.9%
92823
 
7.9%
82820
 
7.9%
22813
 
7.9%
42796
 
7.9%
Latin
ValueCountFrequency (%)
P7109
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII42654
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
09965
23.4%
P7109
16.7%
12923
 
6.9%
62866
 
6.7%
72859
 
6.7%
52856
 
6.7%
32824
 
6.6%
92823
 
6.6%
82820
 
6.6%
22813
 
6.6%

AREA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 17
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 453.8 KiB

Top values (count): Chrompet (1681), Karapakkam (1363), KK Nagar (996), Velachery (979), Anna Nagar (783), Other values (12) (1307)

Length

Max length: 10
Median length: 8
Mean length: 8.342382895
Min length: 4

Characters and Unicode

Total characters: 59306
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Karapakkam
2nd row: Anna Nagar
3rd row: Adyar
4th row: Velachery
5th row: Karapakkam

Common Values

Value             Count  Frequency (%)
Chrompet          1681   23.6%
Karapakkam        1363   19.2%
KK Nagar          996    14.0%
Velachery         979    13.8%
Anna Nagar        783    11.0%
Adyar             773    10.9%
T Nagar           496    7.0%
Chrompt           9      0.1%
Chormpet          6      0.1%
Chrmpet           6      0.1%
Other values (7)  17     0.2%
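
The rare values Chrompt, Chormpet and Chrmpet look like misspellings of Chrompet, which would explain why a field with a handful of real areas reports 17 distinct values. Treating them as misspellings is an assumption, not something the report states. A minimal sketch of an explicit correction map follows; the names `AREA_FIXES` and `normalise_area` are hypothetical, and only variants visible in the table above are included.

```python
from collections import Counter

# Assumed misspellings, taken from the Common Values table above. The
# remaining rare variants are not listed in the report, so they are not mapped.
AREA_FIXES = {
    "Chrompt": "Chrompet",
    "Chormpet": "Chrompet",
    "Chrmpet": "Chrompet",
}


def normalise_area(value):
    """Map a known misspelling to its canonical name; pass other values through."""
    return AREA_FIXES.get(value, value)


raw = ["Chrompet", "Chrompt", "Chormpet", "Chrmpet", "Adyar"]  # toy input
cleaned = Counter(normalise_area(v) for v in raw)
print(cleaned)
```

An explicit map keeps every correction reviewable, which matters when a fuzzy match could merge two genuinely different areas.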

Length

Histogram of lengths of the category
Value             Count  Frequency (%)
nagar             2280   24.3%
chrompet          1681   17.9%
karapakkam        1363   14.5%
kk                996    10.6%
velachery         979    10.4%
anna              783    8.3%
adyar             773    8.2%
t                 496    5.3%
chrompt           9      0.1%
chormpet          6      0.1%
Other values (8)  23     0.2%

Most occurring characters

Value              Count  Frequency (%)
a                  12574  21.2%
r                  7109   12.0%
e                  3655   6.2%
K                  3360   5.7%
m                  3068   5.2%
p                  3068   5.2%
k                  2729   4.6%
h                  2683   4.5%
g                  2286   3.9%
N                  2286   3.9%
Other values (12)  16488  27.8%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  46634  78.6%
Uppercase Letter  10392  17.5%
Space Separator   2280   3.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a12574
27.0%
r7109
15.2%
e3655
 
7.8%
m3068
 
6.6%
p3068
 
6.6%
k2729
 
5.9%
h2683
 
5.8%
g2286
 
4.9%
y1755
 
3.8%
t1702
 
3.6%
Other values (5)6005
12.9%
Uppercase Letter
ValueCountFrequency (%)
K3360
32.3%
N2286
22.0%
C1702
16.4%
A1562
15.0%
V981
 
9.4%
T501
 
4.8%
Space Separator
ValueCountFrequency (%)
2280
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin57026
96.2%
Common2280
 
3.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a12574
22.0%
r7109
12.5%
e3655
 
6.4%
K3360
 
5.9%
m3068
 
5.4%
p3068
 
5.4%
k2729
 
4.8%
h2683
 
4.7%
g2286
 
4.0%
N2286
 
4.0%
Other values (11)14208
24.9%
Common
ValueCountFrequency (%)
2280
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII59306
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a12574
21.2%
r7109
12.0%
e3655
 
6.2%
K3360
 
5.7%
m3068
 
5.2%
p3068
 
5.2%
k2729
 
4.6%
h2683
 
4.5%
g2286
 
3.9%
N2286
 
3.9%
Other values (12)16488
27.8%

INT_SQFT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 1699
Distinct (%): 23.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1382.073006
Minimum: 500
Maximum: 2500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 500
5-th percentile: 702
Q1: 993
Median: 1373
Q3: 1744
95-th percentile: 2084.6
Maximum: 2500
Range: 2000
Interquartile range (IQR): 751

Descriptive statistics

Standard deviation: 457.4109025
Coefficient of variation (CV): 0.3309600147
Kurtosis: -0.8863792596
Mean: 1382.073006
Median Absolute Deviation (MAD): 376
Skewness: 0.1312376308
Sum: 9825157
Variance: 209224.7337
Monotonicity: Not monotonic
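
Several of the INT_SQFT statistics are derived from others in the same block. The sketch below recomputes them from the printed values only, as a consistency check rather than a re-analysis of the data: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, the IQR is Q3 minus Q1, and the range is maximum minus minimum.

```python
# Values copied from the INT_SQFT statistics above.
mean = 1382.073006
std = 457.4109025
q1, q3 = 993, 1744
minimum, maximum = 500, 2500

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.4f} variance={variance:.1f} IQR={iqr} range={value_range}")
```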
Histogram with fixed size bins (bins=50)
Common values

Value                Count  Frequency (%)
1781                 18     0.3%
1538                 15     0.2%
1505                 13     0.2%
1514                 13     0.2%
786                  12     0.2%
1634                 12     0.2%
961                  12     0.2%
1655                 12     0.2%
1081                 12     0.2%
1823                 11     0.2%
Other values (1689)  6979   98.2%

Minimum 10 values

Value  Count  Frequency (%)
500    3      < 0.1%
501    2      < 0.1%
502    1      < 0.1%
504    2      < 0.1%
505    1      < 0.1%
506    1      < 0.1%
507    2      < 0.1%
508    4      0.1%
510    2      < 0.1%
511    1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
2500   1      < 0.1%
2499   1      < 0.1%
2498   1      < 0.1%
2497   1      < 0.1%
2496   3      < 0.1%
2495   2      < 0.1%
2494   1      < 0.1%
2493   1      < 0.1%
2492   1      < 0.1%
2491   1      < 0.1%

DATE_SALE
Categorical

HIGH CARDINALITY

Distinct: 2798
Distinct (%): 39.4%
Missing: 0
Missing (%): 0.0%
Memory size: 465.3 KiB

Top values (count): 06-10-2009 (12), 26-02-2012 (10), 17-11-2010 (10), 06-01-2009 (10), 15-03-2012 (10), Other values (2793) (7057)

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 71090
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 988
Unique (%): 13.9%

Sample

1st row: 04-05-2011
2nd row: 19-12-2006
3rd row: 04-02-2012
4th row: 13-03-2010
5th row: 05-10-2009
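
DATE_SALE is profiled as a categorical string, which is why it raises a high-cardinality alert instead of being summarised as a date. Values such as 26-02-2012 and 17-11-2010 elsewhere in this section can only be read day-first, so the format appears to be DD-MM-YYYY. A minimal parsing sketch under that assumption follows; `parse_sale_date` is a hypothetical helper, and the input is the five sample rows above.

```python
from datetime import datetime


def parse_sale_date(text):
    """Parse a DD-MM-YYYY string such as '26-02-2012' into a date object."""
    return datetime.strptime(text, "%d-%m-%Y").date()


# The five DATE_SALE sample rows shown above.
sample = ["04-05-2011", "19-12-2006", "04-02-2012", "13-03-2010", "05-10-2009"]
dates = sorted(parse_sale_date(s) for s in sample)
print(dates[0].isoformat(), "to", dates[-1].isoformat())
```

Once parsed, the column can be sorted and compared chronologically, which the raw strings do not allow because they sort by day of month first.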

Common Values

Value                Count  Frequency (%)
06-10-2009           12     0.2%
26-02-2012           10     0.1%
17-11-2010           10     0.1%
06-01-2009           10     0.1%
15-03-2012           10     0.1%
12-04-2011           10     0.1%
19-07-2011           9      0.1%
14-08-2010           9      0.1%
30-11-2009           9      0.1%
13-03-2010           9      0.1%
Other values (2788)  7011   98.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
06-10-200912
 
0.2%
26-02-201210
 
0.1%
17-11-201010
 
0.1%
06-01-200910
 
0.1%
15-03-201210
 
0.1%
12-04-201110
 
0.1%
13-03-20109
 
0.1%
01-04-20099
 
0.1%
17-06-20119
 
0.1%
28-02-20129
 
0.1%
Other values (2788)7011
98.6%

Most occurring characters

ValueCountFrequency (%)
020265
28.5%
-14218
20.0%
211858
16.7%
111645
16.4%
92365
 
3.3%
82087
 
2.9%
32060
 
2.9%
71904
 
2.7%
41715
 
2.4%
61527
 
2.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number56872
80.0%
Dash Punctuation14218
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
020265
35.6%
211858
20.9%
111645
20.5%
92365
 
4.2%
82087
 
3.7%
32060
 
3.6%
71904
 
3.3%
41715
 
3.0%
61527
 
2.7%
51446
 
2.5%
Dash Punctuation
ValueCountFrequency (%)
-14218
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common71090
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
020265
28.5%
-14218
20.0%
211858
16.7%
111645
16.4%
92365
 
3.3%
82087
 
2.9%
32060
 
2.9%
71904
 
2.7%
41715
 
2.4%
61527
 
2.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII71090
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
020265
28.5%
-14218
20.0%
211858
16.7%
111645
16.4%
92365
 
3.3%
82087
 
2.9%
32060
 
2.9%
71904
 
2.7%
41715
 
2.4%
61527
 
2.1%

DIST_MAINROAD
Real number (ℝ≥0)

Distinct: 201
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 99.60317907
Minimum: 0
Maximum: 200
Zeros: 33
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10
Q1: 50
Median: 99
Q3: 148
95-th percentile: 190
Maximum: 200
Range: 200
Interquartile range (IQR): 98

Descriptive statistics

Standard deviation: 57.40310959
Coefficient of variation (CV): 0.5763180465
Kurtosis: -1.165240378
Mean: 99.60317907
Median Absolute Deviation (MAD): 49
Skewness: 0.01814383556
Sum: 708079
Variance: 3295.11699
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
39                  56     0.8%
51                  53     0.7%
78                  52     0.7%
77                  49     0.7%
156                 48     0.7%
14                  48     0.7%
73                  48     0.7%
49                  47     0.7%
111                 47     0.7%
4                   46     0.6%
Other values (191)  6615   93.1%

Minimum 10 values

Value  Count  Frequency (%)
0      33     0.5%
1      28     0.4%
2      44     0.6%
3      27     0.4%
4      46     0.6%
5      36     0.5%
6      42     0.6%
7      27     0.4%
8      31     0.4%
9      37     0.5%

Maximum 10 values

Value  Count  Frequency (%)
200    38     0.5%
199    30     0.4%
198    30     0.4%
197    38     0.5%
196    36     0.5%
195    34     0.5%
194    35     0.5%
193    29     0.4%
192    36     0.5%
191    40     0.6%

N_BEDROOM
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): 0.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 416.6 KiB

Top values (count): 1.0 (3795), 2.0 (2352), 3.0 (707), 4.0 (254)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 21324
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 2.0
3rd row: 1.0
4th row: 3.0
5th row: 1.0

Common Values

Value      Count  Frequency (%)
1.0        3795   53.4%
2.0        2352   33.1%
3.0        707    9.9%
4.0        254    3.6%
(Missing)  1      < 0.1%
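
N_BEDROOM holds whole-number counts but is rendered as 1.0, 2.0 and so on. A plausible cause, though the report does not confirm it, is that the single missing entry forced the column to a floating-point type on load. The sketch below shows one way to recover integer counts while keeping the gap explicit; `to_bedroom_count` is a hypothetical helper and the input is a toy list.

```python
def to_bedroom_count(raw):
    """Convert '1.0'-style text to int; keep missing entries as None."""
    if raw is None or raw == "":
        return None
    return int(float(raw))


raw = ["1.0", "2.0", None, "3.0", "4.0"]  # toy column with one missing entry
converted = [to_bedroom_count(v) for v in raw]
print(converted)
```

Leaving the missing entry as `None` avoids silently imputing a bedroom count; how to fill it is a separate modelling decision.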

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
1.0    3795   53.4%
2.0    2352   33.1%
3.0    707    9.9%
4.0    254    3.6%

Most occurring characters

Value  Count  Frequency (%)
0      7108   33.3%
.      7108   33.3%
1      3795   17.8%
2      2352   11.0%
3      707    3.3%
4      254    1.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number14216
66.7%
Other Punctuation7108
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
07108
50.0%
13795
26.7%
22352
 
16.5%
3707
 
5.0%
4254
 
1.8%
Other Punctuation
ValueCountFrequency (%)
.7108
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common21324
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
07108
33.3%
.7108
33.3%
13795
17.8%
22352
 
11.0%
3707
 
3.3%
4254
 
1.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII21324
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
07108
33.3%
.7108
33.3%
13795
17.8%
22352
 
11.0%
3707
 
3.3%
4254
 
1.2%

N_BATHROOM
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 5
Missing (%): 0.1%
Memory size: 416.6 KiB

Top values (count): 1.0 (5589), 2.0 (1515)

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters21312
Distinct characters4
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1.0
2nd row1.0
3rd row1.0
4th row2.0
5th row1.0

Common Values

Value      Count  Frequency (%)
1.0        5589   78.6%
2.0        1515   21.3%
(Missing)  5      0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1.05589
78.7%
2.01515
 
21.3%

Most occurring characters

ValueCountFrequency (%)
07104
33.3%
.7104
33.3%
15589
26.2%
21515
 
7.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number14208
66.7%
Other Punctuation7104
33.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
07104
50.0%
15589
39.3%
21515
 
10.7%
Other Punctuation
ValueCountFrequency (%)
.7104
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common21312
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
07104
33.3%
.7104
33.3%
15589
26.2%
21515
 
7.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII21312
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
07104
33.3%
.7104
33.3%
15589
26.2%
21515
 
7.1%

N_ROOM
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 402.8 KiB

Top values (count): 4 (2563), 3 (2125), 5 (1246), 2 (921), 6 (254)

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters7109
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row3
2nd row5
3rd row3
4th row5
5th row3

Common Values

Value  Count  Frequency (%)
4      2563   36.1%
3      2125   29.9%
5      1246   17.5%
2      921    13.0%
6      254    3.6%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
42563
36.1%
32125
29.9%
51246
17.5%
2921
 
13.0%
6254
 
3.6%

Most occurring characters

ValueCountFrequency (%)
42563
36.1%
32125
29.9%
51246
17.5%
2921
 
13.0%
6254
 
3.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number7109
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
42563
36.1%
32125
29.9%
51246
17.5%
2921
 
13.0%
6254
 
3.6%

Most occurring scripts

ValueCountFrequency (%)
Common7109
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
42563
36.1%
32125
29.9%
51246
17.5%
2921
 
13.0%
6254
 
3.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII7109
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
42563
36.1%
32125
29.9%
51246
17.5%
2921
 
13.0%
6254
 
3.6%

SALE_COND
Categorical

Distinct: 9
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 450.0 KiB

Top values (count): AdjLand (1433), Partial (1429), Normal Sale (1423), AbNormal (1406), Family (1403), Other values (4) (15)

Length

Max length11
Median length7
Mean length7.803910536
Min length6

Characters and Unicode

Total characters55478
Distinct characters20
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowAbNormal
2nd rowAbNormal
3rd rowAbNormal
4th rowFamily
5th rowAbNormal

Common Values

Value        Count  Frequency (%)
AdjLand      1433   20.2%
Partial      1429   20.1%
Normal Sale  1423   20.0%
AbNormal     1406   19.8%
Family       1403   19.7%
Adj Land     6      0.1%
Ab Normal    5      0.1%
Partiall     3      < 0.1%
PartiaLl     1      < 0.1%
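
Four of the nine SALE_COND labels are rare variants that differ from a common label only in spacing, letter case, or a doubled letter. One way to surface such variants without hard-coding them is to group labels by a normalised key. The sketch below uses a hypothetical helper, `variant_groups`, and the key (spaces removed, case folded) is an illustrative choice rather than anything the profiling tool does.

```python
from collections import defaultdict


def variant_groups(labels):
    """Group labels that become equal once spaces and letter case are ignored."""
    groups = defaultdict(set)
    for label in labels:
        groups[label.replace(" ", "").casefold()].add(label)
    return {key: sorted(vals) for key, vals in groups.items()}


# The nine SALE_COND labels from the Common Values table above.
labels = ["AdjLand", "Partial", "Normal Sale", "AbNormal", "Family",
          "Adj Land", "Ab Normal", "Partiall", "PartiaLl"]
groups = variant_groups(labels)
for key, variants in groups.items():
    if len(variants) > 1:
        print(key, variants)
```

This key catches "Adj Land" and "Ab Normal" but leaves "Partiall" in its own group, separate from "Partial", so doubled-letter typos would still need an explicit mapping or a manual review.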

Length

Histogram of lengths of the category

Pie chart

Value     Count  Frequency (%)
adjland   1433   16.8%
partial   1429   16.7%
normal    1428   16.7%
sale      1423   16.7%
abnormal  1406   16.5%
family    1403   16.4%
land      6      0.1%
adj       6      0.1%
ab        5      0.1%
partiall  4      < 0.1%

Most occurring characters

ValueCountFrequency (%)
a9965
18.0%
l7096
12.8%
r4267
 
7.7%
m4237
 
7.6%
d2878
 
5.2%
A2850
 
5.1%
i2836
 
5.1%
o2834
 
5.1%
N2834
 
5.1%
L1440
 
2.6%
Other values (10)14241
25.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter42661
76.9%
Uppercase Letter11383
 
20.5%
Space Separator1434
 
2.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a9965
23.4%
l7096
16.6%
r4267
10.0%
m4237
9.9%
d2878
 
6.7%
i2836
 
6.6%
o2834
 
6.6%
n1439
 
3.4%
j1439
 
3.4%
t1433
 
3.4%
Other values (3)4237
9.9%
Uppercase Letter
ValueCountFrequency (%)
A2850
25.0%
N2834
24.9%
L1440
12.7%
P1433
12.6%
S1423
12.5%
F1403
12.3%
Space Separator
ValueCountFrequency (%)
1434
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin54044
97.4%
Common1434
 
2.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
a9965
18.4%
l7096
13.1%
r4267
 
7.9%
m4237
 
7.8%
d2878
 
5.3%
A2850
 
5.3%
i2836
 
5.2%
o2834
 
5.2%
N2834
 
5.2%
L1440
 
2.7%
Other values (9)12807
23.7%
Common
ValueCountFrequency (%)
1434
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII55478
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a9965
18.0%
l7096
12.8%
r4267
 
7.7%
m4237
 
7.6%
d2878
 
5.2%
A2850
 
5.1%
i2836
 
5.1%
o2834
 
5.1%
N2834
 
5.1%
L1440
 
2.6%
Other values (10)14241
25.7%

PARK_FACIL
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 413.2 KiB

Top values (count): Yes (3587), No (3520), Noo (2)

Length

Max length3
Median length3
Mean length2.504853003
Min length2

Characters and Unicode

Total characters17807
Distinct characters5
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowYes
2nd rowNo
3rd rowYes
4th rowNo
5th rowYes

Common Values

Value  Count  Frequency (%)
Yes    3587   50.5%
No     3520   49.5%
Noo    2      < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
yes3587
50.5%
no3520
49.5%
noo2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
s3587
20.1%
e3587
20.1%
Y3587
20.1%
o3524
19.8%
N3522
19.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter10698
60.1%
Uppercase Letter7109
39.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s3587
33.5%
e3587
33.5%
o3524
32.9%
Uppercase Letter
ValueCountFrequency (%)
Y3587
50.5%
N3522
49.5%

Most occurring scripts

ValueCountFrequency (%)
Latin17807
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s3587
20.1%
e3587
20.1%
Y3587
20.1%
o3524
19.8%
N3522
19.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII17807
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s3587
20.1%
e3587
20.1%
Y3587
20.1%
o3524
19.8%
N3522
19.8%

DATE_BUILD
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 5808
Distinct (%): 81.7%
Missing: 0
Missing (%): 0.0%
Memory size: 465.3 KiB

Top values (count): 02-07-1987 (6), 04-04-1999 (5), 13-05-1982 (4), 19-02-1979 (4), 29-01-1982 (4), Other values (5803) (7086)

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters71090
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique4676 ?
Unique (%)65.8%

Sample

1st row15-05-1967
2nd row22-12-1995
3rd row09-02-1992
4th row18-03-1988
5th row13-10-1979

Common Values

Value                Count  Frequency (%)
02-07-1987           6      0.1%
04-04-1999           5      0.1%
13-05-1982           4      0.1%
19-02-1979           4      0.1%
29-01-1982           4      0.1%
03-01-1979           4      0.1%
16-01-2003           4      0.1%
02-10-1990           4      0.1%
02-12-1982           4      0.1%
27-08-2000           4      0.1%
Other values (5798)  7066   99.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
02-07-19876
 
0.1%
04-04-19995
 
0.1%
18-09-19714
 
0.1%
08-04-19894
 
0.1%
23-01-19874
 
0.1%
21-11-19924
 
0.1%
06-12-19854
 
0.1%
17-01-19964
 
0.1%
19-07-19774
 
0.1%
14-03-19854
 
0.1%
Other values (5798)7066
99.4%

Most occurring characters

ValueCountFrequency (%)
-14218
20.0%
113134
18.5%
011553
16.3%
99749
13.7%
26084
8.6%
83790
 
5.3%
73407
 
4.8%
62574
 
3.6%
32343
 
3.3%
52256
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number56872
80.0%
Dash Punctuation14218
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
113134
23.1%
011553
20.3%
99749
17.1%
26084
10.7%
83790
 
6.7%
73407
 
6.0%
62574
 
4.5%
32343
 
4.1%
52256
 
4.0%
41982
 
3.5%
Dash Punctuation
ValueCountFrequency (%)
-14218
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common71090
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
-14218
20.0%
113134
18.5%
011553
16.3%
99749
13.7%
26084
8.6%
83790
 
5.3%
73407
 
4.8%
62574
 
3.6%
32343
 
3.3%
52256
 
3.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII71090
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
-14218
20.0%
113134
18.5%
011553
16.3%
99749
13.7%
26084
8.6%
83790
 
5.3%
73407
 
4.8%
62574
 
3.6%
32343
 
3.3%
52256
 
3.2%

BUILDTYPE
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 444.2 KiB

Top values (count): House (2444), Commercial (2325), Others (2310), Other (26), Comercial (4)

Length

Max length10
Median length6
Mean length6.962441975
Min length5

Characters and Unicode

Total characters49496
Distinct characters15
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCommercial
2nd rowCommercial
3rd rowCommercial
4th rowOthers
5th rowOthers

Common Values

Value       Count  Frequency (%)
House       2444   34.4%
Commercial  2325   32.7%
Others      2310   32.5%
Other       26     0.4%
Comercial   4      0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
house2444
34.4%
commercial2325
32.7%
others2310
32.5%
other26
 
0.4%
comercial4
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e7109
14.4%
o4773
9.6%
s4754
9.6%
r4665
9.4%
m4654
 
9.4%
u2444
 
4.9%
H2444
 
4.9%
h2336
 
4.7%
t2336
 
4.7%
O2336
 
4.7%
Other values (5)11645
23.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter42387
85.6%
Uppercase Letter7109
 
14.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e7109
16.8%
o4773
11.3%
s4754
11.2%
r4665
11.0%
m4654
11.0%
u2444
 
5.8%
h2336
 
5.5%
t2336
 
5.5%
l2329
 
5.5%
a2329
 
5.5%
Other values (2)4658
11.0%
Uppercase Letter
ValueCountFrequency (%)
H2444
34.4%
O2336
32.9%
C2329
32.8%

Most occurring scripts

ValueCountFrequency (%)
Latin49496
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e7109
14.4%
o4773
9.6%
s4754
9.6%
r4665
9.4%
m4654
 
9.4%
u2444
 
4.9%
H2444
 
4.9%
h2336
 
4.7%
t2336
 
4.7%
O2336
 
4.7%
Other values (5)11645
23.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII49496
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e7109
14.4%
o4773
9.6%
s4754
9.6%
r4665
9.4%
m4654
 
9.4%
u2444
 
4.9%
H2444
 
4.9%
h2336
 
4.7%
t2336
 
4.7%
O2336
 
4.7%
Other values (5)11645
23.5%

UTILITY_AVAIL
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 434.8 KiB

Top values (count): AllPub (1886), NoSeWa (1871), NoSewr (1829), ELO (1522), All Pub (1)

Length

Max length7
Median length6
Mean length5.615135743
Min length3

Characters and Unicode

Total characters39918
Distinct characters17
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowAllPub
2nd rowAllPub
3rd rowELO
4th rowNoSewr
5th rowAllPub

Common Values

Value                         Count  Frequency (%)
AllPub                        1886   26.5%
NoSeWa                        1871   26.3%
NoSewr (with trailing space)  1829   25.7%
ELO                           1522   21.4%
All Pub                       1      < 0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
allpub1886
26.5%
nosewa1871
26.3%
nosewr1829
25.7%
elo1522
21.4%
pub1
 
< 0.1%
all1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
l3774
 
9.5%
e3700
 
9.3%
S3700
 
9.3%
N3700
 
9.3%
o3700
 
9.3%
P1887
 
4.7%
u1887
 
4.7%
b1887
 
4.7%
A1887
 
4.7%
W1871
 
4.7%
Other values (7)11925
29.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter20477
51.3%
Uppercase Letter17611
44.1%
Space Separator1830
 
4.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l3774
18.4%
e3700
18.1%
o3700
18.1%
u1887
9.2%
b1887
9.2%
a1871
9.1%
w1829
8.9%
r1829
8.9%
Uppercase Letter
ValueCountFrequency (%)
S3700
21.0%
N3700
21.0%
P1887
10.7%
A1887
10.7%
W1871
10.6%
L1522
8.6%
E1522
8.6%
O1522
8.6%
Space Separator
ValueCountFrequency (%)
1830
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin38088
95.4%
Common1830
 
4.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
l3774
 
9.9%
e3700
 
9.7%
S3700
 
9.7%
N3700
 
9.7%
o3700
 
9.7%
P1887
 
5.0%
u1887
 
5.0%
b1887
 
5.0%
A1887
 
5.0%
W1871
 
4.9%
Other values (6)10095
26.5%
Common
ValueCountFrequency (%)
1830
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII39918
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l3774
 
9.5%
e3700
 
9.3%
S3700
 
9.3%
N3700
 
9.3%
o3700
 
9.3%
P1887
 
4.7%
u1887
 
4.7%
b1887
 
4.7%
A1887
 
4.7%
W1871
 
4.7%
Other values (7)11925
29.9%

STREET
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 440.9 KiB

Top values (count): Paved (2560), Gravel (2520), No Access (2010), Pavd (12), NoAccess (7)

Length

Max length9
Median length6
Mean length6.486706991
Min length4

Characters and Unicode

Total characters46114
Distinct characters14
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowPaved
2nd rowGravel
3rd rowGravel
4th rowPaved
5th rowGravel

Common Values

Value      Count  Frequency (%)
Paved      2560   36.0%
Gravel     2520   35.4%
No Access  2010   28.3%
Pavd       12     0.2%
NoAccess   7      0.1%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
paved2560
28.1%
gravel2520
27.6%
access2010
22.0%
no2010
22.0%
pavd12
 
0.1%
noaccess7
 
0.1%

Most occurring characters

ValueCountFrequency (%)
e7097
15.4%
v5092
11.0%
a5092
11.0%
s4034
8.7%
c4034
8.7%
d2572
 
5.6%
P2572
 
5.6%
l2520
 
5.5%
r2520
 
5.5%
G2520
 
5.5%
Other values (4)8061
17.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter34978
75.9%
Uppercase Letter9126
 
19.8%
Space Separator2010
 
4.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e7097
20.3%
v5092
14.6%
a5092
14.6%
s4034
11.5%
c4034
11.5%
d2572
 
7.4%
l2520
 
7.2%
r2520
 
7.2%
o2017
 
5.8%
Uppercase Letter
ValueCountFrequency (%)
P2572
28.2%
G2520
27.6%
A2017
22.1%
N2017
22.1%
Space Separator
ValueCountFrequency (%)
2010
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin44104
95.6%
Common2010
 
4.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e7097
16.1%
v5092
11.5%
a5092
11.5%
s4034
9.1%
c4034
9.1%
d2572
 
5.8%
P2572
 
5.8%
l2520
 
5.7%
r2520
 
5.7%
G2520
 
5.7%
Other values (3)6051
13.7%
Common
ValueCountFrequency (%)
2010
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII46114
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e7097
15.4%
v5092
11.0%
a5092
11.0%
s4034
8.7%
c4034
8.7%
d2572
 
5.6%
P2572
 
5.6%
l2520
 
5.5%
r2520
 
5.5%
G2520
 
5.5%
Other values (4)8061
17.5%

MZZONE
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 408.2 KiB

Top values (count): RL (1858), RH (1822), RM (1817), C (550), A (537)

Length

Max length2
Median length2
Mean length1.773245182
Min length1

Characters and Unicode

Total characters12606
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowA
2nd rowRH
3rd rowRL
4th rowI
5th rowC

Common Values

Value  Count  Frequency (%)
RL     1858   26.1%
RH     1822   25.6%
RM     1817   25.6%
C      550    7.7%
A      537    7.6%
I      525    7.4%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
rl1858
26.1%
rh1822
25.6%
rm1817
25.6%
c550
 
7.7%
a537
 
7.6%
i525
 
7.4%

Most occurring characters

Value | Count | Frequency (%)
R | 5497 | 43.6%
L | 1858 | 14.7%
H | 1822 | 14.5%
M | 1817 | 14.4%
C | 550 | 4.4%
A | 537 | 4.3%
I | 525 | 4.2%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 12606 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
R | 5497 | 43.6%
L | 1858 | 14.7%
H | 1822 | 14.5%
M | 1817 | 14.4%
C | 550 | 4.4%
A | 537 | 4.3%
I | 525 | 4.2%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12606 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
R | 5497 | 43.6%
L | 1858 | 14.7%
H | 1822 | 14.5%
M | 1817 | 14.4%
C | 550 | 4.4%
A | 537 | 4.3%
I | 525 | 4.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 12606 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
R | 5497 | 43.6%
L | 1858 | 14.7%
H | 1822 | 14.5%
M | 1817 | 14.4%
C | 550 | 4.4%
A | 537 | 4.3%
I | 525 | 4.2%

QS_ROOMS
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 31
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.517470812
Minimum: 2
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2.1
Q1: 2.7
Median: 3.5
Q3: 4.3
95-th percentile: 4.9
Maximum: 5
Range: 3
Interquartile range (IQR): 1.6

Descriptive statistics

Standard deviation: 0.8919724311
Coefficient of variation (CV): 0.2535834635
Kurtosis: -1.197535123
Mean: 3.517470812
Median Absolute Deviation (MAD): 0.8
Skewness: -0.01895704371
Sum: 25005.7
Variance: 0.7956148178
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
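
The descriptive statistics listed for each numeric variable are related by simple formulas: the variance is the squared standard deviation, the CV is the standard deviation divided by the mean, and the IQR is Q3 minus Q1. The sketch below recomputes them on a small made-up list of scores. It assumes the sample (n - 1) convention for the standard deviation and linear interpolation for quartiles; the function name `describe` and the data are illustrative only and are not taken from the report's implementation.

```python
import statistics

def describe(values):
    """Recompute a few of the summary statistics shown in the report."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    median = statistics.median(values)
    # Median absolute deviation: median of |x - median|.
    mad = statistics.median(abs(v - median) for v in values)
    # Quartiles with linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "mad": mad,
    }

# Made-up quality scores on the same 2 to 5 scale as the QS_* columns.
scores = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
stats = describe(scores)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

The same relations can be checked against the figures above: 0.8919724311 / 3.517470812 gives the reported CV of about 0.2536, and 4.3 - 2.7 gives the reported IQR of 1.6.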
Value | Count | Frequency (%)
2.5 | 265 | 3.7%
3.8 | 259 | 3.6%
3.6 | 255 | 3.6%
4.6 | 252 | 3.5%
3.9 | 245 | 3.4%
4.9 | 242 | 3.4%
3.4 | 240 | 3.4%
4.7 | 239 | 3.4%
4.8 | 239 | 3.4%
4.2 | 239 | 3.4%
Other values (21) | 4634 | 65.2%

Value | Count | Frequency (%)
2 | 203 | 2.9%
2.1 | 236 | 3.3%
2.2 | 213 | 3.0%
2.3 | 224 | 3.2%
2.4 | 208 | 2.9%
2.5 | 265 | 3.7%
2.6 | 237 | 3.3%
2.7 | 200 | 2.8%
2.8 | 226 | 3.2%
2.9 | 220 | 3.1%

Value | Count | Frequency (%)
5 | 228 | 3.2%
4.9 | 242 | 3.4%
4.8 | 239 | 3.4%
4.7 | 239 | 3.4%
4.6 | 252 | 3.5%
4.5 | 218 | 3.1%
4.4 | 219 | 3.1%
4.3 | 225 | 3.2%
4.2 | 239 | 3.4%
4.1 | 222 | 3.1%

QS_BATHROOM
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 31
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.507244338
Minimum: 2
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2.1
Q1: 2.7
Median: 3.5
Q3: 4.3
95-th percentile: 4.9
Maximum: 5
Range: 3
Interquartile range (IQR): 1.6

Descriptive statistics

Standard deviation: 0.8978337054
Coefficient of variation (CV): 0.2559940565
Kurtosis: -1.21625135
Mean: 3.507244338
Median Absolute Deviation (MAD): 0.8
Skewness: 0.0003104318578
Sum: 24933
Variance: 0.8061053625
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)

Value | Count | Frequency (%)
2.7 | 256 | 3.6%
4.8 | 255 | 3.6%
3.7 | 251 | 3.5%
4.7 | 247 | 3.5%
4.9 | 245 | 3.4%
3 | 241 | 3.4%
4.2 | 237 | 3.3%
4.6 | 234 | 3.3%
2.2 | 234 | 3.3%
3.4 | 234 | 3.3%
Other values (21) | 4675 | 65.8%

Value | Count | Frequency (%)
2 | 222 | 3.1%
2.1 | 224 | 3.2%
2.2 | 234 | 3.3%
2.3 | 220 | 3.1%
2.4 | 230 | 3.2%
2.5 | 233 | 3.3%
2.6 | 226 | 3.2%
2.7 | 256 | 3.6%
2.8 | 206 | 2.9%
2.9 | 228 | 3.2%

Value | Count | Frequency (%)
5 | 219 | 3.1%
4.9 | 245 | 3.4%
4.8 | 255 | 3.6%
4.7 | 247 | 3.5%
4.6 | 234 | 3.3%
4.5 | 231 | 3.2%
4.4 | 219 | 3.1%
4.3 | 224 | 3.2%
4.2 | 237 | 3.3%
4.1 | 210 | 3.0%

QS_BEDROOM
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 31
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.485300324
Minimum: 2
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2.1
Q1: 2.7
Median: 3.5
Q3: 4.3
95-th percentile: 4.9
Maximum: 5
Range: 3
Interquartile range (IQR): 1.6

Descriptive statistics

Standard deviation: 0.8872664105
Coefficient of variation (CV): 0.2545738755
Kurtosis: -1.190165265
Mean: 3.485300324
Median Absolute Deviation (MAD): 0.8
Skewness: 0.01728160906
Sum: 24777
Variance: 0.7872416831
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)

Value | Count | Frequency (%)
2.6 | 273 | 3.8%
3.2 | 253 | 3.6%
4 | 248 | 3.5%
3.8 | 244 | 3.4%
2.4 | 244 | 3.4%
3.1 | 243 | 3.4%
2.1 | 242 | 3.4%
3 | 241 | 3.4%
3.4 | 239 | 3.4%
2.2 | 237 | 3.3%
Other values (21) | 4645 | 65.3%

Value | Count | Frequency (%)
2 | 221 | 3.1%
2.1 | 242 | 3.4%
2.2 | 237 | 3.3%
2.3 | 200 | 2.8%
2.4 | 244 | 3.4%
2.5 | 226 | 3.2%
2.6 | 273 | 3.8%
2.7 | 222 | 3.1%
2.8 | 210 | 3.0%
2.9 | 219 | 3.1%

Value | Count | Frequency (%)
5 | 217 | 3.1%
4.9 | 203 | 2.9%
4.8 | 211 | 3.0%
4.7 | 228 | 3.2%
4.6 | 233 | 3.3%
4.5 | 227 | 3.2%
4.4 | 237 | 3.3%
4.3 | 237 | 3.3%
4.2 | 212 | 3.0%
4.1 | 223 | 3.1%

QS_OVERALL
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 479
Distinct (%): 6.8%
Missing: 48
Missing (%): 0.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.503253788
Minimum: 2
Maximum: 4.97
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2.63
Q1: 3.13
Median: 3.5
Q3: 3.89
95-th percentile: 4.37
Maximum: 4.97
Range: 2.97
Interquartile range (IQR): 0.76

Descriptive statistics

Standard deviation: 0.5272229035
Coefficient of variation (CV): 0.1504952068
Kurtosis: -0.4896687645
Mean: 3.503253788
Median Absolute Deviation (MAD): 0.38
Skewness: -0.007263226359
Sum: 24736.475
Variance: 0.27796399
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
3.54 | 59 | 0.8%
3.26 | 57 | 0.8%
3.32 | 56 | 0.8%
3.56 | 55 | 0.8%
3.36 | 54 | 0.8%
3.34 | 53 | 0.7%
3.2 | 51 | 0.7%
3.47 | 51 | 0.7%
3.96 | 51 | 0.7%
3.49 | 50 | 0.7%
Other values (469) | 6524 | 91.8%

Value | Count | Frequency (%)
2 | 1 | < 0.1%
2.06 | 2 | < 0.1%
2.09 | 1 | < 0.1%
2.11 | 1 | < 0.1%
2.18 | 3 | < 0.1%
2.195 | 1 | < 0.1%
2.2 | 1 | < 0.1%
2.21 | 4 | 0.1%
2.22 | 5 | 0.1%
2.23 | 1 | < 0.1%

Value | Count | Frequency (%)
4.97 | 1 | < 0.1%
4.95 | 1 | < 0.1%
4.94 | 1 | < 0.1%
4.93 | 1 | < 0.1%
4.9 | 1 | < 0.1%
4.87 | 1 | < 0.1%
4.865 | 1 | < 0.1%
4.85 | 1 | < 0.1%
4.83 | 2 | < 0.1%
4.82 | 1 | < 0.1%

REG_FEE
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 7038
Distinct (%): 99.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 376938.3307
Minimum: 71177
Maximum: 983922
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 71177
5-th percentile: 197984.6
Q1: 272406
Median: 349486
Q3: 451562
95-th percentile: 669167.4
Maximum: 983922
Range: 912745
Interquartile range (IQR): 179156

Descriptive statistics

Standard deviation: 143070.662
Coefficient of variation (CV): 0.3795598652
Kurtosis: 1.126499412
Mean: 376938.3307
Median Absolute Deviation (MAD): 85998
Skewness: 1.037754561
Sum: 2679654593
Variance: 2.046921433 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
235229 | 3 | < 0.1%
257917 | 2 | < 0.1%
419552 | 2 | < 0.1%
318691 | 2 | < 0.1%
291085 | 2 | < 0.1%
348034 | 2 | < 0.1%
424361 | 2 | < 0.1%
276423 | 2 | < 0.1%
289451 | 2 | < 0.1%
515696 | 2 | < 0.1%
Other values (7028) | 7088 | 99.7%

Value | Count | Frequency (%)
71177 | 1 | < 0.1%
95798 | 1 | < 0.1%
103928 | 1 | < 0.1%
106466 | 1 | < 0.1%
111366 | 1 | < 0.1%
111690 | 1 | < 0.1%
112960 | 1 | < 0.1%
113759 | 1 | < 0.1%
114011 | 1 | < 0.1%
114210 | 1 | < 0.1%

Value | Count | Frequency (%)
983922 | 1 | < 0.1%
981117 | 1 | < 0.1%
963029 | 1 | < 0.1%
952411 | 1 | < 0.1%
947124 | 1 | < 0.1%
942859 | 1 | < 0.1%
941567 | 1 | < 0.1%
940813 | 1 | < 0.1%
936314 | 1 | < 0.1%
931224 | 1 | < 0.1%

COMMIS
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 7011
Distinct (%): 98.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 141005.7265
Minimum: 5055
Maximum: 495405
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 5055
5-th percentile: 35990.6
Q1: 84219
Median: 127628
Q3: 184506
95-th percentile: 292538
Maximum: 495405
Range: 490350
Interquartile range (IQR): 100287

Descriptive statistics

Standard deviation: 78768.09372
Coefficient of variation (CV): 0.558616275
Kurtosis: 1.073363345
Mean: 141005.7265
Median Absolute Deviation (MAD): 49095
Skewness: 0.9516562165
Sum: 1002409710
Variance: 6204412588
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
117825 | 3 | < 0.1%
127189 | 2 | < 0.1%
103224 | 2 | < 0.1%
177975 | 2 | < 0.1%
223620 | 2 | < 0.1%
120598 | 2 | < 0.1%
88403 | 2 | < 0.1%
75962 | 2 | < 0.1%
69258 | 2 | < 0.1%
149257 | 2 | < 0.1%
Other values (7001) | 7088 | 99.7%

Value | Count | Frequency (%)
5055 | 1 | < 0.1%
5126 | 1 | < 0.1%
5378 | 1 | < 0.1%
5620 | 1 | < 0.1%
5943 | 1 | < 0.1%
6038 | 1 | < 0.1%
6149 | 1 | < 0.1%
6190 | 1 | < 0.1%
6236 | 1 | < 0.1%
6349 | 1 | < 0.1%

Value | Count | Frequency (%)
495405 | 1 | < 0.1%
491961 | 1 | < 0.1%
485924 | 1 | < 0.1%
481001 | 1 | < 0.1%
479297 | 1 | < 0.1%
475795 | 1 | < 0.1%
471247 | 1 | < 0.1%
470784 | 1 | < 0.1%
469920 | 1 | < 0.1%
466156 | 1 | < 0.1%

SALES_PRICE
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 7057
Distinct (%): 99.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10894909.64
Minimum: 2156875
Maximum: 23667340
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2156875
5-th percentile: 5630100
Q1: 8272100
Median: 10335050
Q3: 12993900
95-th percentile: 18790428
Maximum: 23667340
Range: 21510465
Interquartile range (IQR): 4721800

Descriptive statistics

Standard deviation: 3768603.457
Coefficient of variation (CV): 0.345904976
Kurtosis: 0.5881293416
Mean: 10894909.64
Median Absolute Deviation (MAD): 2317605
Skewness: 0.7733433359
Sum: 7.745191262 × 10^10
Variance: 1.420237202 × 10^13
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value | Count | Frequency (%)
4517000 | 2 | < 0.1%
12370100 | 2 | < 0.1%
8416200 | 2 | < 0.1%
6988500 | 2 | < 0.1%
6519000 | 2 | < 0.1%
6709750 | 2 | < 0.1%
8191250 | 2 | < 0.1%
7629750 | 2 | < 0.1%
6677000 | 2 | < 0.1%
13098030 | 2 | < 0.1%
Other values (7047) | 7089 | 99.7%

Value | Count | Frequency (%)
2156875 | 1 | < 0.1%
2476375 | 1 | < 0.1%
2640250 | 1 | < 0.1%
2797250 | 1 | < 0.1%
2939750 | 1 | < 0.1%
3000375 | 1 | < 0.1%
3001250 | 1 | < 0.1%
3013500 | 1 | < 0.1%
3029750 | 1 | < 0.1%
3081375 | 1 | < 0.1%

Value | Count | Frequency (%)
23667340 | 1 | < 0.1%
23407860 | 1 | < 0.1%
23314580 | 1 | < 0.1%
23307000 | 1 | < 0.1%
23247590 | 1 | < 0.1%
23013500 | 1 | < 0.1%
22918500 | 1 | < 0.1%
22916500 | 1 | < 0.1%
22852890 | 1 | < 0.1%
22829130 | 1 | < 0.1%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
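
The rank-then-correlate recipe above can be sketched in a few lines of plain Python. This is an illustrative version, not the report's own code: the helper names `average_ranks`, `pearson` and `spearman` are made up here, and tied values are assumed to receive the average of the ranks they span.

```python
import math

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship (y = x cubed), made up for illustration.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman(x, y)
r = pearson(x, y)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

On this toy input ρ reaches 1 because the ordering is perfectly preserved, while r stays below 1 because the relationship is not a straight line.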

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
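
As a small check of the location and scale invariance, the sketch below computes r on made-up data and again after shifting and rescaling both variables. The function name `pearson_r` and the numbers are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(x, y)

# Change location and (positive) scale of each variable separately.
x_shifted = [10 * a + 7 for a in x]
y_shifted = [0.5 * b - 3 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(f"r = {r:.6f}, after rescaling = {r_shifted:.6f}")
```

The common (n - 1) factors of the covariance and the standard deviations cancel, so the sums of squares are used directly.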

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
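
A minimal sketch of that pair-counting definition follows. It implements the plain form (often called τ-a) exactly as worded above; tie-adjusted variants such as τ-b, which statistical libraries commonly use, give different values when ties are present. The function name `kendall_tau_a` and the data are made up for illustration.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Tied pairs (sign == 0) count toward the total only.
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair relative to x
tau = kendall_tau_a(x, y)
print(f"tau = {tau:.4f}")
```

With four observations there are six pairs; five are concordant and one is discordant, so τ = (5 - 1) / 6.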

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
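
The following is a sketch of a bias-corrected Cramér's V in the form usually attributed to Bergsma (2013): the squared φ coefficient and the table dimensions are each shrunk by a term that depends on the sample size. Whether the report computes it in exactly this way is an assumption; the function name `cramers_v_corrected` and the category labels are made up.

```python
import math
from collections import Counter

def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long lists of categories."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row, n_row in row_totals.items():
        for col, n_col in col_totals.items():
            expected = n_row * n_col / n
            observed = joint.get((row, col), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Perfectly associated made-up categories: each street type maps to one zone.
street = ["Paved", "Gravel"] * 50
zone = ["RL", "RH"] * 50
v_perfect = cramers_v_corrected(street, zone)

# Independent made-up categories: every combination is equally frequent.
left = ["x", "x", "y", "y"] * 25
right = ["p", "q", "p", "q"] * 25
v_independent = cramers_v_corrected(left, right)

print(v_perfect, v_independent)
```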

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
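
One way to read the nullity correlation described above is as the Pearson correlation between per-column "value present" indicators. The sketch below assumes that reading; the function name `nullity_correlation`, the use of `None` as the missing marker and the sample columns are all illustrative. Columns that are fully present or fully missing have no variation, so the sketch returns `None` for them.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the present/missing indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = math.sqrt(sum((p - ma) ** 2 for p in a))
    sb = math.sqrt(sum((q - mb) ** 2 for q in b))
    if sa == 0 or sb == 0:
        return None  # no variation in missingness, correlation undefined
    return cov / (sa * sb)

# Made-up columns: b is missing exactly where a is, c is missing where a is present.
col_a = [1.0, None, 3.0, None, 5.0, 6.0]
col_b = ["x", None, "y", None, "z", "w"]
col_c = [None, 1, None, 2, None, None]
col_full = [1, 2, 3, 4, 5, 6]

same = nullity_correlation(col_a, col_b)
opposite = nullity_correlation(col_a, col_c)
undefined = nullity_correlation(col_a, col_full)
print(same, opposite, undefined)
```

A value near +1 means the two columns tend to be missing together, and a value near -1 means one tends to be present when the other is missing.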

Sample

First rows

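Index | PRT_ID | AREA | INT_SQFT | DATE_SALE | DIST_MAINROAD | N_BEDROOM | N_BATHROOM | N_ROOM | SALE_COND | PARK_FACIL | DATE_BUILD | BUILDTYPE | UTILITY_AVAIL | STREET | MZZONE | QS_ROOMS | QS_BATHROOM | QS_BEDROOM | QS_OVERALL | REG_FEE | COMMIS | SALES_PRICE
0 | P03210 | Karapakkam | 1004 | 04-05-2011 | 131 | 1.0 | 1.0 | 3 | AbNormal | Yes | 15-05-1967 | Commercial | AllPub | Paved | A | 4.0 | 3.9 | 4.9 | 4.330 | 380000 | 144400 | 7600000
1 | P09411 | Anna Nagar | 1986 | 19-12-2006 | 26 | 2.0 | 1.0 | 5 | AbNormal | No | 22-12-1995 | Commercial | AllPub | Gravel | RH | 4.9 | 4.2 | 2.5 | 3.765 | 760122 | 304049 | 21717770
2 | P01812 | Adyar | 909 | 04-02-2012 | 70 | 1.0 | 1.0 | 3 | AbNormal | Yes | 09-02-1992 | Commercial | ELO | Gravel | RL | 4.1 | 3.8 | 2.2 | 3.090 | 421094 | 92114 | 13159200
3 | P05346 | Velachery | 1855 | 13-03-2010 | 14 | 3.0 | 2.0 | 5 | Family | No | 18-03-1988 | Others | NoSewr | Paved | I | 4.7 | 3.9 | 3.6 | 4.010 | 356321 | 77042 | 9630290
4 | P06210 | Karapakkam | 1226 | 05-10-2009 | 84 | 1.0 | 1.0 | 3 | AbNormal | Yes | 13-10-1979 | Others | AllPub | Gravel | C | 3.0 | 2.5 | 4.1 | 3.290 | 237000 | 74063 | 7406250
5 | P00219 | Chrompet | 1220 | 11-09-2014 | 36 | 2.0 | 1.0 | 4 | Partial | No | 12-09-2009 | Commercial | NoSeWa | No Access | RH | 4.5 | 2.6 | 3.1 | 3.320 | 409027 | 198316 | 12394750
6 | P09105 | Chrompet | 1167 | 05-04-2007 | 137 | 1.0 | 1.0 | 3 | Partial | No | 12-04-1979 | Other | AllPub | No Access | RL | 3.6 | 2.1 | 2.5 | 2.670 | 263152 | 33955 | 8488790
7 | P09679 | Velachery | 1847 | 13-03-2006 | 176 | 3.0 | 2.0 | 5 | Family | No | 15-03-1996 | Commercial | AllPub | Gravel | RM | 2.4 | 4.5 | 2.1 | 3.260 | 604809 | 235204 | 16800250
8 | P03377 | Chrompet | 771 | 06-04-2011 | 175 | 1.0 | 1.0 | 2 | AdjLand | No | 14-04-1977 | Others | NoSewr | Paved | RM | 2.9 | 3.7 | 4.0 | 3.550 | 257578 | 33236 | 8308970
9 | P09623 | Velachery | 1635 | 22-06-2006 | 74 | 2.0 | 1.0 | 4 | AbNormal | No | 26-06-1991 | Others | ELO | No Access | I | 3.1 | 3.1 | 3.3 | 3.160 | 323346 | 121255 | 8083650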

Last rows

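Index | PRT_ID | AREA | INT_SQFT | DATE_SALE | DIST_MAINROAD | N_BEDROOM | N_BATHROOM | N_ROOM | SALE_COND | PARK_FACIL | DATE_BUILD | BUILDTYPE | UTILITY_AVAIL | STREET | MZZONE | QS_ROOMS | QS_BATHROOM | QS_BEDROOM | QS_OVERALL | REG_FEE | COMMIS | SALES_PRICE
7099 | P03828 | Adyar | 895 | 05-01-2011 | 197 | 1.0 | 1.0 | 3 | AdjLand | Yes | 15-01-1971 | House | NoSewr | No Access | I | 3.6 | 4.7 | 4.2 | 4.12 | 250641 | 7372 | 7371800
7100 | P05438 | T Nagar | 1733 | 24-02-2010 | 191 | 1.0 | 1.0 | 4 | AbNormal | Yes | 02-03-1985 | Commercial | NoSeWa | No Access | RL | 3.4 | 3.7 | 2.1 | 2.89 | 702058 | 312026 | 19501600
7101 | P05042 | Karapakkam | 666 | 11-05-2010 | 51 | 1.0 | 1.0 | 2 | AdjLand | Yes | 20-05-1974 | Others | ELO | Gravel | I | 3.2 | 4.4 | 2.5 | 3.28 | 273317 | 74541 | 6211750
7102 | P05560 | Karapakkam | 701 | 03-02-2010 | 100 | 1.0 | 1.0 | 2 | AbNormal | No | 08-02-1990 | House | NoSeWa | Gravel | RH | 4.2 | 3.0 | 2.0 | 2.96 | 282175 | 141088 | 5643500
7103 | P05133 | Karapakkam | 1462 | 23-04-2010 | 68 | 2.0 | 2.0 | 4 | Family | No | 29-04-1986 | Others | NoSeWa | Gravel | RM | 2.7 | 3.3 | 3.6 | 3.24 | 356716 | 178358 | 9387250
7104 | P03834 | Karapakkam | 598 | 03-01-2011 | 51 | 1.0 | 1.0 | 2 | AdjLand | No | 15-01-1962 | Others | ELO | No Access | RM | 3.0 | 2.2 | 2.4 | 2.52 | 208767 | 107060 | 5353000
7105 | P10000 | Velachery | 1897 | 08-04-2004 | 52 | 3.0 | 2.0 | 5 | Family | Yes | 11-04-1995 | Others | NoSeWa | No Access | RH | 3.6 | 4.5 | 3.3 | 3.92 | 346191 | 205551 | 10818480
7106 | P09594 | Velachery | 1614 | 25-08-2006 | 152 | 2.0 | 1.0 | 4 | Normal Sale | No | 01-09-1978 | House | NoSeWa | Gravel | I | 4.3 | 4.2 | 2.9 | 3.84 | 317354 | 167028 | 8351410
7107 | P06508 | Karapakkam | 787 | 03-08-2009 | 40 | 1.0 | 1.0 | 2 | Partial | Yes | 11-08-1977 | Commercial | ELO | Paved | RL | 4.6 | 3.8 | 4.1 | 4.16 | 425350 | 119098 | 8507000
7108 | P09794 | Velachery | 1896 | 13-07-2005 | 156 | 3.0 | 2.0 | 5 | Partial | Yes | 24-07-1961 | Others | ELO | Paved | I | 3.1 | 3.5 | 4.3 | 3.64 | 349177 | 79812 | 9976480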